ABSTRACT

The robustness of "Student's" t-test with regard to the 
assumption of normality is investigated. This is accomplished 
by empirically developing the distribution of a new statistic 
from a large number of samples of varying sizes from the Beta 
and the Gamma distributions using different values of the 
parameters. The various significance levels of this new 
distribution are then compared with the corresponding signifi- 
cance levels from "Student's" t distribution using an IBM 360 
computer. A comparison of the distribution frequencies with
the standard normal distribution frequencies is also presented.
I. INTRODUCTION

In practical work, an Operations Research Specialist is 
generally involved with the problems of bringing the theoret- 
ical structures into some degree of correspondence with the 
situations of practical experience. One of the common methods
of achieving this objective is by hypothesis testing. The 
hypothesis testing method makes use of a variety of statisti- 
cal techniques and procedures involving both parametric and 
non-parametric tests. In the derivation of most of the para- 
metric tests, it is usual to assume a form of mathematical 
model involving some specific probability distribution and 
then to select some statistical criterion that is sensitive to 
change in the specific factors tested. This type of criterion 
is generally known as the "power of the test". 

However, another desirable requirement of any statistical 
test is that it be insensitive to change in the underlying
assumptions. This property of the test is generally referred 
to as the "robustness" of the test. Common questions probing 
into the robustness of a test are: When the actual distribu- 
tion is not known, which statistical tests can be used with 
less hesitation?; and What percentage of error would be 
incurred if the underlying assumptions are violated? Past 
research and subsequent literature indicate that the property 
of robustness is generally not satisfied by the parametric 
tests. This is substantiated by research conducted by G.E.P. 
Box [7] and R. C. Geary [8]. 

One of the most common parametric tests used for hypothesis
testing is "Student's" t-test. This test was developed by
William Sealy Gosset in 1908 when he was working for Guinness
Brewery in Dublin, Ireland. The discovery of the t-test was
a direct result of the peculiar problems of brewing with its
variable material and its susceptibility to temperature
changes. A number of experiments performed at the brewery
emphasized the limitations of large sample theory and revealed
the necessity for a correct method of treating small samples.
It was, therefore, the circumstances of Gosset's work which
led to his discovery of the Theory of Errors and the
distribution of the sample standard deviation, which was
developed later into the well known t-test.








II. "STUDENT'S" t DISTRIBUTION 

As was mentioned earlier, the Theory of Errors finds its
origin in the fact that the accuracy of the mean of a number
of observations may be estimated from the discrepancies observed
among the individual values used in obtaining the mean. When
the samples are drawn from a normal population having a
variance (mean squared deviation) equal to $\sigma^2$, it can be
shown that the mean $\bar{x}$ of N observations is also normally
distributed, with variance $\sigma_{\bar{x}}^2 = \sigma^2 / N$. Thus if the
variance of the mean is known, then the desired information
regarding the distribution of the mean can be easily determined.
In order to test a hypothesis that the population has mean $\mu$,
one would merely need to calculate

$$ t = \frac{\bar{x} - \mu}{\sigma / \sqrt{N}} $$

and then the integral

$$ I = \frac{1}{\sqrt{2\pi}} \int_{t}^{\infty} e^{-u^2/2} \, du $$

would give the probability that a more discrepant value would
occur. If the value of I so calculated turns out to be a small
quantity, such as .01, one should conclude with some confidence
that the hypothesis was not, in fact, true of the population
sampled.

In a majority of the cases in which such tests are required,
one can find no prior knowledge of the variance of the
population. The variance of the population can be estimated,
however, from the sample itself by the following relationships.
If $X_1, X_2, \ldots, X_N$ represents a sample, then

$$ s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (X_i - \bar{x})^2 $$

is an estimate of the unknown variance $\sigma^2$, and

$$ \bar{x} = \frac{1}{N} \sum_{i=1}^{N} X_i $$

is the sample mean and an estimate of the population mean.
W. S. Gosset, in his fundamental paper of 1908, showed
that even though $s$ as calculated above is a good estimate
of $\sigma$, it would be erroneous to assume that the statistic

$$ t = \frac{\bar{x} - \mu}{s / \sqrt{N}} $$

would be normally distributed or that the significance of an
observation could be accurately tested by using the normal
probability theory. Assuming the independence of $s^2$ and
$\bar{x}$, Gosset was able to develop the exact distribution of
this statistic, which in its modern form is known as "Student's"
t distribution. It states that if X and U are two independently
distributed random variables such that X is normally distributed
with mean $\mu$ and variance $\sigma^2$, and U has chi-square
distribution with N degrees of freedom, then the ratio

$$ t = \frac{(X - \mu)/\sigma}{\sqrt{U/N}} $$

has the distribution of "Student's" t, and its density function
is given by

$$ f(t) = \frac{\Gamma\!\left(\frac{N+1}{2}\right)}{\sqrt{\pi N}\,\Gamma\!\left(\frac{N}{2}\right)} \cdot \frac{1}{\left(1 + \frac{t^2}{N}\right)^{(N+1)/2}}, \qquad -\infty < t < \infty $$

From this definition, it can be seen that "Student's" t
distribution depends on N only and is independent of both
parameters of the normal distribution sampled. It is further
noted that the derivation of "Student's" t distribution
depends on the fact that the random variable X be drawn from
a normal population.
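
As an illustration only, the density above can be evaluated
numerically and compared with the standard normal density. The
Python sketch below is not part of the original computation; the
function names are hypothetical and the integration rule is an
arbitrary choice made for this example.

```python
import math


def student_t_density(t: float, n: int) -> float:
    """Density f(t) of "Student's" t with n degrees of freedom."""
    coeff = math.gamma((n + 1) / 2) / (math.sqrt(math.pi * n) * math.gamma(n / 2))
    return coeff * (1.0 + t * t / n) ** (-(n + 1) / 2)


def normal_density(t: float) -> float:
    """Standard normal density, used only for comparison."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)


def trapezoid(f, a: float, b: float, steps: int = 20000) -> float:
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


if __name__ == "__main__":
    # The peak at t = 0 is lower than the normal peak for small n,
    # which reflects the heavier tails of the t distribution.
    for n in (4, 9, 14, 19):
        print(n, student_t_density(0.0, n), normal_density(0.0))
```

The density depends only on the number of degrees of freedom, so
no parameter of the sampled normal population appears in the code.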

One of the common uses of "Student's" t-test is to compare
the mean of a population with some standard. In practical
problems, however, it is often the case that the distribution
of the population being sampled is not known or is not normal.
It is therefore desirable that some measure of the degree of
error be obtained for "Student's" t-test when the assumption
of normality of the sample is violated.

Literature reveals that a great deal of research has been
directed towards many aspects of "Student's" distribution. In
1925, R. A. Fisher very stringently verified the independence
of the sample mean $\bar{x}$ and the sample variance $s^2$. This was
one of the basic assumptions in the development of "Student's"
distribution [2]. In 1936, R. C. Geary investigated the
distribution of "Student's" t ratio for samples drawn from
a slightly asymmetrical universe by developing asymptotic
formulae for the moments of the t distribution. He showed that
for symmetrical, non-normal populations, "Student's" distribution
gives more accurate results than for skewed sampling
distributions [4]. In 1948, A. K. Gayen, using the moments
techniques, studied the behavior of "Student's" t and obtained
results similar to those obtained by R. C. Geary in 1936 [5]. In
more recent work, Cucconi developed a simple relation between the
critical values of "Student's" t and the degrees of freedom.
Through this relation, in addition to deriving the critical
values of t for any number of degrees of freedom, he showed
that it is possible to substitute "Student's" criteria in the
verification of statistical hypotheses more easily than using
a standardized normal deviate [9]. These are but a few
examples of the diversified attention given to "Student's"
t-test. The aim of this paper is to investigate the robustness
of "Student's" t-test when the sampling data are drawn from
the Beta and the Gamma distributions.
III. METHODS AND PROCEDURE

In order to investigate the robustness of the t-test, it was
decided to select two well known distributions with a wide
scope of application in the real world. The two distributions
selected are the Gamma and Beta distributions. A brief
description of these distributions follows.


A. GAMMA DISTRIBUTION 
A continuous random variable Y is said to have the Gamma
distribution if its density is

$$ f(y) = \begin{cases} \dfrac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\, y^{\alpha-1} e^{-y/\beta}, & 0 < y < \infty \\ 0, & \text{otherwise} \end{cases} $$

It should be noted that this is a two parameter family of
distributions with parameters $\alpha$ and $\beta$. The distribution
is only defined when $\alpha > 0$ and $\beta > 0$. The mean and
variance of this distribution are respectively $\alpha\beta$
and $\alpha\beta^2$. The Gamma distribution is frequently applied to
the solution of queuing and inventory control problems. A
graphical representation of the Gamma distribution for various
values of its parameters is given in Appendix A.

One of the special and most often used families of Gamma
distributions is called the Erlang distribution. In order to
obtain this distribution, the values of the parameter $\alpha$ in
the Gamma distribution are restricted to positive integer values
and this positive integer is denoted by K. If one replaces
$\beta$ by $1/\lambda$, then the distribution becomes

$$ f(y) = \begin{cases} \dfrac{\lambda^{K} y^{K-1} e^{-\lambda y}}{(K-1)!}, & y > 0 \\ 0, & \text{otherwise} \end{cases} $$

It is possible to show that the latter f(y) can be generated
by the sum of K variates Z having the exponential density

$$ g(z) = \begin{cases} \lambda e^{-\lambda z}, & z > 0 \\ 0, & \text{otherwise} \end{cases} $$

It should be noted that when K = 1, we have the exponential
density as a special case. The mean, variance and mode of the
Erlang distribution are $K/\lambda$, $K/\lambda^2$ and $(K-1)/\lambda$
respectively. The great usefulness of this distribution stems
from the fact that it is a large family of distributions
permitting only non-negative values. The waiting time and the
service time distributions in queuing theory can, therefore, be
reasonably approximated by an Erlang distribution.


B. THE BETA DISTRIBUTION
A continuous random variable Z is said to have a Beta
distribution when

$$ f(z) = \begin{cases} \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, z^{\alpha-1} (1-z)^{\beta-1}, & 0 < z < 1 \\ 0, & \text{otherwise} \end{cases} $$

This density is a function of the two parameters $\alpha$ and
$\beta$, both of which are positive constants. The mean and
variance of this distribution are given by

$$ \frac{\alpha}{\alpha+\beta} \quad \text{and} \quad \frac{\alpha\beta}{(\alpha+\beta)^2 (\alpha+\beta+1)} $$

respectively. A graphical representation of the Beta distribution
for various values of the parameters is furnished in Appendix B.

It may be pointed out that when $\alpha = \beta = 1$ the
distribution becomes the uniform distribution over the unit
interval. The Beta distribution, though in general defined over
the unit interval, can also be defined over any interval (a, b)
as follows:

$$ f(z) = \begin{cases} \dfrac{1}{k}\,(z-a)^{\alpha-1} (b-z)^{\beta-1}, & a < z < b \\ 0, & \text{otherwise} \end{cases} $$

where

$$ k = \int_{a}^{b} (z-a)^{\alpha-1} (b-z)^{\beta-1} \, dz $$
a 


Primarily because the Beta distribution is defined on a
finite interval, it is widely used in the study of critical
path scheduling techniques such as PERT. It may be recalled
that PERT is a method of reflecting the uncertainties
associated with development-type tasks, and its use generally
results in providing a more realistic outlook for task
accomplishment than the conventional systems provided in the
past. In order to obtain the critical path through the network,
it is necessary that a distribution defined between the
pessimistic and optimistic time estimates adequately represent
the distribution of time required to perform an activity. Since
the Beta distribution restricts the probability distribution
to a finite range for varying values of the parameters, it is
widely used in critical path scheduling.








C. GENERATION OF DISTRIBUTIONS

The inverse probability integral transformation technique
is used to produce random variates having the exponential
distribution. According to this technique, the cumulative
distribution function for the uniform probability density
function on the interval (0,1) is equated to the cumulative
distribution function for the desired probability density
function in order to obtain the necessary transformation.
Thus if RN denotes a random number uniformly distributed on the
interval (0,1), and $\lambda$ is the desired parameter of the
exponential distribution, then the random variable obtained from
the transformation

$$ X = -\frac{1}{\lambda} \ln(RN) $$

is exponentially distributed with a mean $\mu = 1/\lambda$.

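
The transformation above can be sketched in a few lines of
Python. This is a minimal illustration and not the program used
in the study; the function name, the seed, and the number of
draws are assumptions made for the example.

```python
import math
import random


def exponential_variate(lam: float, rng: random.Random) -> float:
    """Inverse-transform draw from the exponential density with parameter lam."""
    # rng.random() lies in [0, 1); using 1 - RN keeps the argument of
    # the logarithm in (0, 1] and has the same uniform distribution.
    rn = 1.0 - rng.random()
    return -math.log(rn) / lam


# Illustrative check: the sample mean should be close to 1 / lam.
rng = random.Random(12345)  # arbitrary seed, assumed for reproducibility
lam = 3.0
draws = [exponential_variate(lam, rng) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)
```

With $\lambda = 3$ the sample mean of a large number of draws
should lie near $1/3$.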

1. Erlang Distribution

In order to generate the Erlang distribution, an
extension of the above procedure is utilized by first
generating exponential random variables with mean $1/\lambda$.
Then K of these exponential variates are summed using the
relationship

$$ Y = -\sum_{i=1}^{K} \frac{\ln(RN_i)}{\lambda} $$

to produce the desired Erlang distribution with mean
$K/\lambda$ and variance $K/\lambda^2$.
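
A minimal Python sketch of this summation follows. It is offered
as an illustration under stated assumptions: the function name,
seed, and number of draws are hypothetical, and K = 4,
$\lambda = 3$ are used only because they appear later in the
text.

```python
import math
import random


def erlang_variate(k: int, lam: float, rng: random.Random) -> float:
    """Sum of k independent exponential variates, each with mean 1 / lam."""
    # 1 - rng.random() lies in (0, 1], so the logarithm is always defined.
    return -sum(math.log(1.0 - rng.random()) for _ in range(k)) / lam


rng = random.Random(2024)  # arbitrary seed, assumed for reproducibility
k, lam = 4, 3.0
draws = [erlang_variate(k, lam, rng) for _ in range(50_000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((y - sample_mean) ** 2 for y in draws) / (len(draws) - 1)
# Theory: mean = K / lam = 1.333..., variance = K / lam**2 = 0.444...
```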


2. Beta Distribution 
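The Beta distribution is generated using the following
theoretical ideas. Consider two random samples $X_1, X_2, \ldots, X_n$
and $Y_1, Y_2, \ldots, Y_m$ such that the random variables X and Y are
identically distributed exponential with parameter $\lambda$, and
let $\mu = 1/\lambda$. If we form the statistics

$$ Z_1 = \sum_{i=1}^{n} X_i \quad \text{and} \quad Z_2 = \sum_{i=1}^{m} Y_i $$

then $Z_1$ and $Z_2$ are independent Gamma variates, and their
joint density function is given by

$$ g(z_1, z_2) = \frac{z_1^{\,n-1} z_2^{\,m-1} e^{-(z_1+z_2)/\mu}}{\Gamma(n)\,\Gamma(m)\,\mu^{n+m}}, \qquad 0 < z_1 < \infty, \; 0 < z_2 < \infty $$

Now using the transformation $W_1 = Z_1/(Z_1+Z_2)$ and
$W_2 = Z_1 + Z_2$, the joint density of $W_1$ and $W_2$ is given by:

$$ h(w_1, w_2) = \frac{w_1^{\,n-1} (1-w_1)^{m-1}\, w_2^{\,n+m-1} e^{-w_2/\mu}}{\Gamma(n)\,\Gamma(m)\,\mu^{n+m}} $$

Integrating over the domain of $W_2$, the marginal density of $W_1$ is

$$ h(w_1) = \int_{0}^{\infty} h(w_1, w_2) \, dw_2 $$

i.e.,

$$ h(w_1) = \frac{w_1^{\,n-1} (1-w_1)^{m-1}}{\Gamma(n)\,\Gamma(m)\,\mu^{n+m}} \int_{0}^{\infty} w_2^{\,n+m-1} e^{-w_2/\mu} \, dw_2 $$

Using the substitution $u = w_2/\mu$ and $dw_2 = \mu\,du$ we get:

$$ h(w_1) = \frac{w_1^{\,n-1} (1-w_1)^{m-1}}{\Gamma(n)\,\Gamma(m)} \int_{0}^{\infty} u^{\,n+m-1} e^{-u} \, du = \frac{\Gamma(n+m)}{\Gamma(n)\,\Gamma(m)}\, w_1^{\,n-1} (1-w_1)^{m-1} $$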
We observe that $W_1 = Y_1/(Y_1+Y_2)$, where $Y_1$ and $Y_2$ denote
the two Gamma variates, has the Beta distribution
with $\alpha = n$, $\beta = m$, and this Beta distribution is
independent of the parameter of the exponential distribution
provided that the two Gamma distributions are generated by
random numbers having identical exponential distributions. Thus
to generate the Beta distribution, two Gamma distributions with
parameters $(K_1, \lambda)$ and $(K_2, \lambda)$ are generated using
the above techniques. From these distributions, the ratio
$W_1 = Y_1/(Y_1+Y_2)$ is formed to obtain a Beta distributed random
variable with mean

$$ \frac{K_1}{K_1+K_2} $$

and variance

$$ \frac{K_1 K_2}{(K_1+K_2)^2 (K_1+K_2+1)} $$
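
The ratio construction can be sketched as follows. The code is
an illustration only; the function names, seed, and number of
draws are assumptions, and K_1 = 4, K_2 = 5, $\lambda = 3$ are
chosen to match one of the parameter sets mentioned later.

```python
import math
import random


def erlang_variate(k: int, lam: float, rng: random.Random) -> float:
    """Sum of k independent exponential variates, each with mean 1 / lam."""
    return -sum(math.log(1.0 - rng.random()) for _ in range(k)) / lam


def beta_variate(k1: int, k2: int, lam: float, rng: random.Random) -> float:
    """Ratio Y1 / (Y1 + Y2) of two independent Erlang variates sharing lam."""
    y1 = erlang_variate(k1, lam, rng)
    y2 = erlang_variate(k2, lam, rng)
    return y1 / (y1 + y2)


rng = random.Random(2024)  # arbitrary seed, assumed for reproducibility
k1, k2, lam = 4, 5, 3.0
draws = [beta_variate(k1, k2, lam, rng) for _ in range(50_000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((w - sample_mean) ** 2 for w in draws) / (len(draws) - 1)

# Theoretical values from the formulas in the text.
theory_mean = k1 / (k1 + k2)
theory_var = k1 * k2 / ((k1 + k2) ** 2 * (k1 + k2 + 1))
```

Changing `lam` should leave the distribution of the ratio
unchanged, in line with the independence from the exponential
parameter noted above.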
D. SIMULATION TECHNIQUES

The accuracy of generating random observations from a
probability distribution depends upon the characteristics of
the uniform random number generator. At least the superficial
characteristics of the random number generator are ascertained
by the use of the chi-square test for goodness-of-fit at 95% and
99% significance levels. For the chi-square test, the unit
interval is divided into fifty equal intervals. Each generated
random number is assigned to one of the fifty categories
according to its size. From this data, a measure of the
discrepancy existing between observed and expected frequencies is
obtained using the relationship

$$ \chi^2 = \sum_{i=1}^{T} \frac{(o_i - e_i)^2}{e_i} $$

where $o_i$ denotes the observed frequency of the ith category,
$e_i$ denotes the expected or theoretical frequency of this
occurrence, and T denotes the total number of categories. The
number of degrees of freedom for the statistic is equal to T-1.
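
The goodness-of-fit check described above may be sketched as
follows. This is an illustration, not the original program: the
generator tested here is Python's own, the count of 10,000
random numbers is an assumption (the text does not state how
many were used), and the critical values are quoted from a
standard chi-square table for 49 degrees of freedom.

```python
import random


def chi_square_statistic(observed, expected):
    """Sum over categories of (o_i - e_i)^2 / e_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def uniformity_counts(numbers, categories=50):
    """Assign each number in [0, 1) to one of `categories` equal intervals."""
    counts = [0] * categories
    for u in numbers:
        index = min(int(u * categories), categories - 1)
        counts[index] += 1
    return counts


rng = random.Random(7)  # arbitrary seed, assumed for reproducibility
total = 10_000          # assumed number of random numbers tested
numbers = [rng.random() for _ in range(total)]

observed = uniformity_counts(numbers)
expected = [total / 50] * 50
chi2 = chi_square_statistic(observed, expected)

# Assumed table values for T - 1 = 49 degrees of freedom.
CRITICAL_05 = 66.34
CRITICAL_01 = 74.92
rejected_at_05 = chi2 > CRITICAL_05
rejected_at_01 = chi2 > CRITICAL_01
```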

For the purpose of this project, four simulated distribu- 
tions were obtained for each type of distribution sampled. 
Each simulation utilizes 5000 samples and a fixed sample size. 
In the discussion throughout this paper, sample size is referr- 
ed to as the number of data points within each sample used to 
calculate the t statistic and the number of samples denotes the 
number of replications for fixed values of the parameters with- 
in each cycle. In order to obtain different shapes of the 
distribution, five different sets of values of the parameters 
are then used within each simulation. 

Since the basic techniques and procedures employed in
each of the four Simulations for the Erlang and the Beta 
distributions are essentially the same, it is considered 
adequate to discuss the methodology of one simulation for each 
distribution in detail. 

1. Erlang Distribution

Simulation 1 for the Erlang distribution consists
of four steps. Step I consists of five cycles. For each
cycle, 5000 samples of size 5 are independently generated
from the Erlang distribution using the values of the parameters
as shown in Table 1 below. For the purpose of discussion,
these sampling distributions are referred to as $Y(K, \lambda)$.
TABLE 1

                      Cycle 1      Cycle 2      Cycle 3      Cycle 4      Cycle 5

Value of Parameters   K=4, λ=3     K=5, λ=3     [illegible]  [illegible]  [illegible]

Mean                  1.33         1.667        2.0          [illegible]  [illegible]

Standard Deviation    .6669        .7408        .8200        .8624        .9368



To illustrate the use of Table 1, 5000 samples of
size 5 are generated during Cycle 1 using parameters K = 4,
$\lambda$ = 3. The mean and standard deviation of this Erlang
distribution are given in rows 3 and 4 of Table 1. During Cycle 2,
another 5000 samples of size 5 are independently generated
from the Erlang distribution using parameters K = 5, $\lambda$ = 3.
The mean and standard deviation of this distribution are 1.667
and .7408 respectively. These values are furnished in rows 3
and 4 of Table 1 under the heading Cycle 2. The same process
is repeated for Cycles 3, 4, and 5 using the appropriate
parameters.

In Step II, for each sample the statistic

$$ t = \frac{\bar{X} - \mu}{s / \sqrt{N}} $$

is calculated, where $\bar{X}$ is the sample mean, s is the sample
standard deviation, N is the sample size (5), and $\mu$ is the
population mean (1.33). The absolute values of this statistic are
then compared with the corresponding values from the "Student's"
t-distribution with N-1 degrees of freedom and at significance
levels $\alpha$ = .8, .5, .2, .1, .05, .02, .01, and .001. These
values of "Student's" t-distribution with varying degrees of
freedom and significance levels are generally referred to as the
critical values $t_\alpha$ of the t-distribution and are the
solutions to the equation

$$ 1 - \frac{\alpha}{2} = \int_{-\infty}^{t_\alpha} f(t) \, dt $$

for various values of alpha. The number of t statistics whose
absolute values are greater than the corresponding critical
values at each significance level is recorded for comparison
purposes.
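
Step II can be sketched in Python as follows. This is a minimal
illustration under stated assumptions rather than the original
program: the function names and seed are hypothetical, the
critical values are two-tailed values for 4 degrees of freedom
quoted from a standard t-table, and the exact mean 4/3 is used
in place of the rounded value 1.33. A run with normal samples is
included only as a reference case.

```python
import math
import random
import statistics

# Two-tailed critical values of "Student's" t for 4 degrees of freedom,
# quoted from a standard t-table (an assumption of this sketch).
CRITICAL_T_DF4 = {0.8: 0.271, 0.5: 0.741, 0.2: 1.533, 0.1: 2.132, 0.05: 2.776}


def erlang_variate(k, lam, rng):
    """Sum of k independent exponential variates, each with mean 1 / lam."""
    return -sum(math.log(1.0 - rng.random()) for _ in range(k)) / lam


def t_statistic(sample, mu):
    """(sample mean - mu) / (s / sqrt(N)), with s the sample standard deviation."""
    n = len(sample)
    return (statistics.mean(sample) - mu) / (statistics.stdev(sample) / math.sqrt(n))


def observed_levels(draw, mu, replications, rng, sample_size=5):
    """Fraction of samples whose |t| exceeds each critical value."""
    exceed = {alpha: 0 for alpha in CRITICAL_T_DF4}
    for _ in range(replications):
        sample = [draw(rng) for _ in range(sample_size)]
        t_abs = abs(t_statistic(sample, mu))
        for alpha, critical in CRITICAL_T_DF4.items():
            if t_abs > critical:
                exceed[alpha] += 1
    return {alpha: count / replications for alpha, count in exceed.items()}


rng = random.Random(1970)  # arbitrary seed, assumed for reproducibility
erlang_levels = observed_levels(lambda r: erlang_variate(4, 3.0, r), 4 / 3, 5000, rng)
normal_levels = observed_levels(lambda r: r.gauss(0.0, 1.0), 0.0, 5000, rng)
```

For normal samples each observed level should fall close to its
nominal $\alpha$; departures from this in `erlang_levels` are
what the comparison in this step is meant to expose.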

In order to fully understand the comparisons between
the actual significance levels and the significance levels
obtained from the Erlang distribution, it is necessary that
some knowledge regarding the comparisons between the
distribution frequencies be known. This is accomplished in Step
III by transforming the total number of data points for the
Erlang distribution in each cycle (5000 x 5 = 25,000) using the
transformation

$$ \frac{X_i - \mu}{\sigma}, \qquad i = 1, \ldots, 25000 $$

where $\mu$ and $\sigma$ are the sampling distribution mean and
standard deviation respectively. The transformed distribution is
called Y(0,1). It may be pointed out that if the sampling
distribution were normal, with mean $\mu$ and standard deviation
$\sigma$, then this transformation would reduce the sampling
distribution to what is called Standard Normal, mean zero,
variance unity. This is generally denoted by N(0,1). The
transformed distribution Y(0,1) is then divided into four
intervals of width unity on either side of the assumed mean of
zero, and the frequency counts within each interval are recorded.
These counts are then compared with the expected number of
frequency counts within each interval assuming that the sampling
distribution was normal.
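
The frequency comparison of Step III can be sketched as follows.
This is an illustration only. The function names are
hypothetical, and one detail is assumed: the two outermost
intervals are extended to infinity so that every data point is
counted, which the text does not specify.

```python
import math


def normal_cdf(x: float) -> float:
    """Cumulative distribution function of N(0,1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_normal_probabilities():
    """Probabilities of the eight unit-width intervals around zero under N(0,1)."""
    edges = [-3, -2, -1, 0, 1, 2, 3]
    probs = [normal_cdf(edges[0])]                 # everything below -3
    for lo, hi in zip(edges, edges[1:]):
        probs.append(normal_cdf(hi) - normal_cdf(lo))
    probs.append(1.0 - normal_cdf(edges[-1]))      # everything above +3
    return probs


def interval_counts(data, mu: float, sigma: float):
    """Standardize each point and count it in one of the eight intervals."""
    counts = [0] * 8
    for x in data:
        z = (x - mu) / sigma
        index = min(max(math.floor(z) + 4, 0), 7)
        counts[index] += 1
    return counts


def expected_counts(total: int):
    """Expected frequency counts for `total` points if the data were normal."""
    return [total * p for p in expected_normal_probabilities()]
```

For a cycle of 25,000 data points, `expected_counts(25000)` gives
the normal reference frequencies against which the observed
`interval_counts` would be compared.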

In Step IV, the chi-square goodness-of-fit test, using
a significance level of 0.05, is conducted on the results of
the observed significance levels and the distribution
frequencies as calculated in Steps II and III above. This
concludes Cycle 1 of Simulation 1. For Cycles 2, 3, 4, and 5, the
entire procedure as outlined above is repeated by changing the
parameters of the $Y(K, \lambda)$ distribution in accordance with
Table 1.

Simulations 2, 3, and 4 for the Erlang distribution
use exactly the same techniques as outlined under Simulation
1 except that the sample sizes are 10, 15, and 20 respectively.

2. Beta Distribution

Simulation 1 for the Beta distribution is conducted
in four steps. Step I consists of five cycles which are shown
in Table 2. For each cycle, 5000 samples of size 5 are
independently generated from each of the two Erlang distributions
using fixed values of the parameters as shown in Table 2.
These distributions are referred to as $Y_1(K_1, \lambda)$ and
$Y_2(K_2, \lambda)$.
TABLE 2

Cycle Number             1            2            3            4            5

Values of Parameters,    K_1=4, λ=3   K_1=4, λ=3   [illegible]  [illegible]  [illegible]
Y_1(K_1, λ)

Values of Parameters,    K_2=4, λ=3   K_2=5, λ=3   [illegible]  [illegible]  [illegible]
Y_2(K_2, λ)



To illustrate the use of Table 2, 5000 samples of
size 5 are generated for the $Y_1(K_1, \lambda)$ distribution using
the parameters $K_1$ = 4 and $\lambda$ = 3. Another 5000 samples of
size 5 are independently generated for the $Y_2(K_2, \lambda)$
distribution using parameters $K_2$ = 4 and $\lambda$ = 3 as shown
in Table 2. During Cycle 2, new samples of size 5 for each of the
$Y_1(K_1, \lambda)$ and $Y_2(K_2, \lambda)$ distributions are
independently generated using the parameters $K_1$ = 4,
$\lambda$ = 3 and $K_2$ = 5, $\lambda$ = 3 respectively. These values
are found in Table 2 under the heading Cycle 2. A similar process
is repeated for the remaining Cycles 3, 4, and 5.

For each cycle, the $Y_1(K_1, \lambda)$ and $Y_2(K_2, \lambda)$
distributions are combined using the relationship
$Y_1(K_1, \lambda) / \{Y_1(K_1, \lambda) + Y_2(K_2, \lambda)\}$ to form
5000 samples of size 5 for the Beta distribution having mean

$$ \frac{K_1}{K_1+K_2} $$

and variance

$$ \frac{K_1 K_2}{(K_1+K_2)^2 (K_1+K_2+1)} $$

The actual values of these calculations for each cycle are
shown in Table 2. It should be apparent that each cycle
generates 5000 samples of size 5 from a different Beta
distribution.
The remaining Steps II, III, and IV of this simulation
are identical with Steps II, III, and IV of Simulation
1 under the Erlang distribution. Simulations 2, 3, and 4 for
the Beta distribution use exactly the same techniques as
outlined in Simulation 1 except that the sample sizes are 10,
15, and 20 respectively.

In addition, the effects of changes in the variance
of the Beta distribution, when the mean is kept constant, are
also investigated. This is accomplished by manipulating the
values of the $Y_1(K_1, \lambda)$ and $Y_2(K_2, \lambda)$ parameters
that result in Beta distributions having the same mean but
different variances. The sets of values of these parameters used
during this simulation are furnished in Table 3 below.


TABLE 3

Cycle #                      1         2         3         4         5

Parameters of Y_1(K_1, λ)    K_1=2     K_1=4     K_1=6     K_1=8     K_1=10
                             λ=3       λ=3       λ=3       λ=3       λ=3

Parameters of Y_2(K_2, λ)    K_2=2     K_2=4     K_2=6     K_2=8     K_2=10
                             λ=3       λ=3       λ=3       λ=3       λ=3

Mean                         .5        .5        .5        .5        .5
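
As a worked illustration of how the variance changes while the
mean stays fixed, the formulas for the Beta mean and variance
can be evaluated exactly. The sketch below assumes
K_1 = K_2 = K for K = 2, 4, 6, 8, 10, which is what a constant
mean of one half implies; the function name is hypothetical.

```python
from fractions import Fraction


def beta_mean_variance(k1: int, k2: int):
    """Exact mean and variance of the Beta variate Y1 / (Y1 + Y2)."""
    total = k1 + k2
    mean = Fraction(k1, total)
    variance = Fraction(k1 * k2, total ** 2 * (total + 1))
    return mean, variance


# Assumed parameter sets: K_1 = K_2 = K, so the mean is always 1/2.
rows = [(k,) + beta_mean_variance(k, k) for k in (2, 4, 6, 8, 10)]
for k, mean, variance in rows:
    print(k, mean, variance, float(variance))
```

Under this assumption the variance reduces to 1 / (4(2K + 1)), so
it falls from 1/20 at K = 2 to 1/84 at K = 10 while the mean
remains at one half.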
IV. RESULTS

The results of the various simulations for the Erlang and
Beta distributions are furnished in tabular form in Appendices
C, D, E, F, G, and H. Appendix C contains Tables 4, 5, 6, and
7. These tables contain the frequency comparisons of the
distributions obtained in these simulations and from Standard
Normal, designated N(0,1), when samples of size 5, 10, 15, and 20
are used respectively. Tables 8, 9, 10, and 11 in Appendix D
contain similar results obtained from the use of Beta
distributions. Appendix E contains Tables 12 and 13. These tables
have the comparison of the frequency distributions from N(0,1)
and the transformed Beta(0,1) when the mean of the distribution
is kept constant, and its variance is manipulated.

Appendices F, G, and H contain five tables each. These
tables provide the necessary comparisons between the actual
and observed significance levels for a fixed mean and 4, 9, 14,
and 19 degrees of freedom. Tables 14, 15, 16, 17, and 18 in
Appendix F pertain to the Erlang distribution and contain the
comparisons of the actual and the observed significance levels
when samples of size 5, 10, 15, and 20 are used respectively.
Tables 19, 20, 21, 22, and 23 in Appendix G contain similar
results for the Beta distributions. Tables 24, 25, 26, 27,
and 28 in Appendix H have the comparisons of the significance
levels when the variance of the Beta distribution is
manipulated for a fixed mean.

Since there are but two different table formats used to
illustrate the results, it is considered appropriate to discuss
the use of only one table of each type before proceeding with
the discussion of the results. Tables of the first type,
which give frequency comparisons, are used in Appendices C,
D, and E. The second type of table is used to provide the
comparisons of significance levels, and these tables are
contained in Appendices F, G, and H.

As mentioned earlier, Table 4 pertains to the Erlang
distribution. This table is divided into two parts. Part I
furnishes the means and the standard deviations of the sampling
distributions when samples of size 5 are used. The means
and standard deviations for each distribution are identified
by means of index numbers. For example, Index (1) pertains
to the Erlang distribution with mean 1.33 and standard
deviation .6669. Part II of this table furnishes the frequency
distributions of each Erlang given in Part I and the frequency
distributions of N(0,1). In order to relate the mean and
variance with the corresponding frequency distributions, the
appropriate index should be consulted. For example, if one
wishes to compare the frequency distribution of an Erlang with
mean 2.0 and standard deviation .82, when using samples of
size 5, one should consult Appendix C, Table 4. Corresponding
to the above values, one would select Index 3 from Part I of
Table 4 and obtain the frequency distribution from row (3)
in Part II of this table. In order to compare this frequency
distribution with N(0,1), one would compare the corresponding
values of row 3 with the row marked N(0,1). The asterisks on
the distribution frequencies indicate that the distribution
was found to be significantly different from N(0,1) by the
chi-square goodness-of-fit test at the 95% confidence level.

The second type of table is found in Appendices F, G, and
H. Table 15 in Appendix F pertains to the Erlang distribution
with a fixed mean of 1.667. This table provides the comparison
of the actual significance levels and the observed significance
levels when the sampling population is Erlang with mean 1.667.
This table furnishes these comparisons for four values of
degrees of freedom. These values are 4, 9, 14, and 19; they
are found in the first column of the table. The format of this
table is very similar to the standard t-table found in most
text books. In order to make use of this table, one is
required to know the sample size and the population mean. For
example, if one is interested in determining the comparison of
significance levels when sampling from the Erlang distribution
with mean 2.0 using sample size 15, one would consult Appendix
F, Table 16 and obtain the required information in the row
marked 14. The asterisks on the observed significance values
indicate that the results were found to be significantly
different at the 95% confidence level by the use of the
chi-square goodness-of-fit test.
RESULTS OF ERLANG 

The results shown in Tables 4, 5, 6, and 7 indicate that
the Erlang frequency distributions are significantly different
from N(0,1) at the 5% level. A comparison of the observed
significance levels with the actual significance levels in
Table 14, Appendix F reveals that when an Erlang distribution
with mean 1.333 is used the observed alpha values are
significantly different from the actual alpha values. For example,
for sample size 5, corresponding to the actual values of
$\alpha$ = .5, .2, and .05, the observed values are $\alpha$ = .034,
.004, and .0004 respectively. It is further noted that as the
sample size is increased to 10, 15, and 20, the corresponding
observed values are zero in each case. This means that all the
absolute values of the t statistic are smaller than the critical
values.

From the results in Table 15, one observes that for
sample size 5 at $\alpha$ = .5, .2, .05, the observed values are
$\alpha$ = .0292, .002, and .0002 respectively. Each of these values
is smaller than the corresponding values in Table 14. The results
for sample sizes 10, 15, and 20 are the same as found in Table
14. The results in Tables 16 and 17 indicate that for sample
size 5, the corresponding observed values in each case are
$\alpha$ = .0292, .0034, .0002 and .0268, .0028, .0002 respectively.
The alpha values for sample sizes 10, 15, and 20 remain zero.
In examining the graphs of these distributions, one
observes that as the mean of the distribution is increased, the
distribution, though flatter than the normal distribution,
becomes more nearly symmetric around the mean. However, as the
mean of the distribution is increased, the variance
($K/\lambda^2$) is also increased. Consequently, when sampling from
this distribution, the absolute values of the t statistic,
$(\bar{x} - \mu)/(s/\sqrt{N})$, are smaller than the absolute values
of this statistic when sampling from the normal population.
Furthermore, these values decrease as the sample size is
increased.

From the foregoing results, it is concluded that
"Student's" t-test is sensitive to the Erlang distribution. It is
further observed that when sampling from an Erlang
distribution, the resulting t statistic has significantly shorter
tails than the "Student's" t distribution. It would seem,
therefore, that the resulting t statistic is a form of
leptokurtic curve.
RESULTS OF BETA DISTRIBUTION 

Results in Tables 8, 9, 10, and 11 reveal that except
when the sample size is 20 and the distribution is symmetric
around the mean (.5), the Beta distribution frequencies are
significantly different from N(0,1). A comparison of the
results in Table 19, Appendix G reveals that when sampling
from the Beta distribution with mean .5, the observed alpha
values in most cases are not significantly different from the
actual alpha values. For example, for sample size 5, the
observed alpha values are only significant at $\alpha$ = .8 and .5.
However, when the sample size is increased to 10, 15, and 20,
the results are only significant at $\alpha$ = .8. This suggests
that when sampling from a symmetric Beta distribution, using
samples of size 5 or greater, the "Student's" t-test yields
results that are comparable with results obtained from normal
sampling distributions.

The results in Table 20 reveal that when sampling from
the Beta distribution with mean .44, the observed alpha values
for sample size 10 remain unchanged. The observed alpha values
using sample size 5 are significantly different at $\alpha$ = .8,
.5, and .2. The observed alpha values for sample size 20 are
significant at $\alpha$ = .8, .5, .1 and .05. Comparing similar
results in Tables 21 and 22, one observes that as the mean of
the sampling distribution is varied to the left of the mean,
the use of larger sample sizes (10, 15) yields observed alpha
values that are comparable to the actual alpha values. In
Table 23, one observes that when the sampling distribution has
mean .33, i.e., it is significantly skewed to the right, the
observed alpha values for sample size 5 are significantly
different from the actual alpha values. The results for
sample size 10 are reasonably comparable with the actual alpha
values.

From the foregoing, it is concluded that the "Student's"
t-test is sensitive to changes in the shape resulting from
changes in the mean of the distribution. When the Beta
distribution is symmetric around the mean, the use of
"Student's" t-test yields results that are comparable with
those obtained when sampling from the normal population. It is
further noted that most of the observed alpha values are
smaller than the actual alpha values. This suggests that the
values of the new t statistic are smaller than the
corresponding values of "Student's" t statistic. It would seem
that the tails of the new t statistic are shorter than the
tails of "Student's" t.
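
To make the link between the mean and the shape of the Beta distribution concrete, the sketch below evaluates the standard closed-form mean, variance, and skewness of a Beta(a, b) distribution. The shape parameters used in the loop (a fixed at 4, b running from 4 to 8) are an assumption: they are chosen only because they give means of approximately .5, .44, .40, .36, and .33, the values quoted for Tables 19 to 23, and they are not stated in this section.

```python
import math


def beta_moments(a, b):
    """Return (mean, variance, skewness) of a Beta(a, b) distribution."""
    s = a + b
    mean = a / s
    variance = a * b / (s ** 2 * (s + 1))
    skewness = 2 * (b - a) * math.sqrt(s + 1) / ((s + 2) * math.sqrt(a * b))
    return mean, variance, skewness


if __name__ == "__main__":
    # Hypothetical shape parameters: a fixed at 4, b from 4 to 8.
    for b in range(4, 9):
        mean, var, skew = beta_moments(4, b)
        print(f"a=4 b={b}  mean={mean:.3f}  variance={var:.4f}  skewness={skew:.3f}")
```

Under these assumed parameters the skewness is zero in the symmetric case and grows positive (a longer right tail) as the mean moves below .5.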

The results of the Beta distributions with fixed mean
and different variances are contained in Tables 24, 25, 26,
27, and 28. A similar investigation of these results indicates
that for small sample size (5) the observed alpha values are
significantly different from the true values. However, as
the sample size is increased to 10, the difference between the
observed and actual alpha values is considerably reduced.
This suggests that "Student's" t-test is sensitive to changes
in the variance when small samples are used.


V. CONCLUSIONS AND RECOMMENDATIONS 

The simulation described on the foregoing pages, which
investigated the robustness of "Student's" t-test, has proven
to be extremely realistic. The results of violating the
assumption of normality by the use of the Beta and the Gamma
distributions indicate that when the sampling distribution is
symmetric around the mean, the use of "Student's" t-test gives
results that are closely comparable with the values obtained
by the use of normal distributions. This implies that
"Student's" t-test is reasonably robust when it is used with
symmetric populations. The results also indicate that in the
case of skewed distributions, the observed alpha values are
significantly different from the actual significance levels.
It is further noted that in the case of symmetric distributions,
the values of the observed significance levels approach the
true values as the sample size is increased.

In the case of symmetrical distributions with a fixed
mean and different variances, it is observed that "Student's"
t-test is sensitive to the change in the variance. The
observed significance levels are considerably different from
the true values as the variance of the sampling population is
increased. However, as the sample size is increased, the
effects of change in the variance are considerably reduced.
Table 4
POPULATION: Erlang
SAMPLE SIZE: 5
Part I Population Parameters
Part II Comparison of Distribution Frequencies

Table 5
POPULATION: Erlang
SAMPLE SIZE: 10
Part I Population Parameters
Part II Comparison of Distribution Frequencies

Table 6
POPULATION: Erlang
SAMPLE SIZE: 15
Part I Population Parameters
Part II Comparison of Distribution Frequencies

APPENDIX C

Table 7
POPULATION: Erlang
SAMPLE SIZE: 20
Part I Population Parameters

APPENDIX D

Table 8
POPULATION: Beta
SAMPLE SIZE: 5
Part I Population Parameters
Part II Comparison of Distribution Frequencies

Table 9
POPULATION: Beta
SAMPLE SIZE: 10
Part I Population Parameters

Table 10
POPULATION: Beta
SAMPLE SIZE: 15
Part I Population Parameters
Part II Comparison of Distribution Frequencies

Table 11
POPULATION: Beta
SAMPLE SIZE: 20
Part I Population Parameters
Part II Comparison of Distribution Frequencies

Table 12
POPULATION: Beta
SAMPLE SIZE: 5 (Fixed Mean, Different Variance)
Part I Population Parameters
Part II Comparison of Distribution Frequencies

Table 13
POPULATION: Beta
SAMPLE SIZE: 10 (Fixed Mean, Different Variance)
Part I Population Parameters

Table 14
POPULATION: Erlang

Table 15
POPULATION: Erlang
POPULATION MEAN: 1.667

Table 16
POPULATION: Erlang
POPULATION MEAN: 2.0

Table 17
POPULATION: Erlang
POPULATION MEAN: 2.33

Table 18
POPULATION: Erlang
POPULATION MEAN: 2.667

APPENDIX G

Table 19
POPULATION: Beta
POPULATION MEAN: .5
ACTUAL SIGNIFICANCE LEVELS

Table 20
POPULATION: Beta
POPULATION MEAN: .44
ACTUAL SIGNIFICANCE LEVELS

Table 21
POPULATION: Beta
POPULATION MEAN: .40
ACTUAL SIGNIFICANCE LEVELS

Table 22
POPULATION: Beta
POPULATION MEAN: .36

Table 23
POPULATION: Beta
POPULATION MEAN: .33
ACTUAL SIGNIFICANCE LEVELS

Table 24
POPULATION: Beta (Fixed Mean-Different Variance)
POPULATION MEAN: .5
POPULATION VARIANCE: .036

APPENDIX H

Table 25
POPULATION: Beta (Fixed Mean-Different Variance)
POPULATION MEAN:
POPULATION VARIANCE:

Table 26
POPULATION: Beta (Fixed Mean-Different Variance)
POPULATION MEAN: .5
POPULATION VARIANCE: .0195

Table 27
POPULATION: Beta
POPULATION MEAN: .5
POPULATION VARIANCE: Os
ACTUAL SIGNIFICANCE LEVELS

POPULATION MEAN:

APPENDIX J


Instructions for the Use of Computer Program


The attached computer program was used to obtain the
simulation results for the Beta distribution. The program,
written in Fortran, runs on an IBM 360 computer. It consists
of four simulations, requires 47 minutes of execution time,
and uses 505 K of storage. The program as presented in
Appendix J is self-sufficient in that it does not require any
external subroutines or programs for its execution. The
program contains its own random number generator, together
with its seed characteristics. It enables the user to
manipulate the various parameters without any reorganization
of the program.

To assist users in understanding and utilizing the various
facets of the program, a number of comment cards are used
throughout the program. In order to illustrate the techniques
of varying the parameters of the Erlang distribution, the
number of cycles in the simulation, or the number of samples
for the distributions, the following instructions could be
useful. If one consults Appendix J, page 62, he should find
that parameters B, = 3 corresponds to i = 3, and the values of
K1 and K2 represent the values of parameter K in the Erlang
distribution. In order to obtain five different Erlang
distributions in each cycle of each simulation (as illustrated
in Table 2), one would set K2 = 4, K1 = 3 and make use of the
Do loop 777. Thus, in order to change the number of cycles to
3, one would simply replace Do 777 IB=1,5 with Do 777 IB=1,3.


This program is designed to furnish 4 different sample
sizes. This is accomplished by the first Do loop on page 62;
i.e., Do 778 IN=1,4. The sample size is determined by the
value of the increment in the statement MR = MR + 5. If four
sample sizes of 2, 4, 6, and 8 are required, one would
simply change the statement to MR = MR + 2. If only two
sample sizes (of 5000 samples each) are desired, then one
would replace Do 778 IN = 1,4 by Do 778 IN = 1,2. The number
of samples in each simulation is determined by the Do loop
"Do 1000 II = 1,50". As is apparent from the dimensions of
GAMMA1 and GAMMA2, the program, for given values of K1 and
K2, generates 100 samples of size MR which are Beta
distributed. It calculates the new t statistic, compares the
absolute value of this statistic with the critical values,
and stores the Beta variate in a location called "STOR".
Utilizing a different seed, the entire process is repeated 50
times, thus yielding a total of 5000 samples of size MR.
After the generation of the 5000 samples, which are located
in "STOR", the distribution frequency comparisons are
performed, and the desired results are provided as indicated
in Appendix J. The values of the parameters are then
incremented by the Do loop 777, and the entire process is
repeated until 5 cycles are completed for the given sample
size. At the end of Simulation I, the sample size is
incremented, and the entire process is repeated.
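
The loop structure described above can be paraphrased in Python as follows. This is only a sketch under stated assumptions, not a translation of the Fortran program: the function names, the use of Python's `random` module, the per-batch seeding, and the parameter schedule (k1 fixed at 4, k2 stepping from 4 to 8) are all hypothetical. The sketch also stores the t statistic in `stor`, whereas the text says the Beta variate is stored in "STOR". The Beta variate is formed here as the ratio G1 / (G1 + G2) of two independent Erlang variates, which is a standard construction.

```python
import math
import random


def erlang(rng, k):
    """Erlang(k) variate with unit rate: the sum of k exponential variates."""
    return sum(rng.expovariate(1.0) for _ in range(k))


def beta_from_erlang(rng, k1, k2):
    """Beta(k1, k2) variate as G1 / (G1 + G2) for independent Erlang G1, G2."""
    g1 = erlang(rng, k1)
    g2 = erlang(rng, k2)
    return g1 / (g1 + g2)


def run_cycle(k1, k2, size, crit, batches=50, per_batch=100):
    """One cycle: batches * per_batch samples of the given size.

    Returns the stored t statistics and the fraction whose absolute
    value exceeds the critical value crit.
    """
    mu = k1 / (k1 + k2)                  # true mean of Beta(k1, k2)
    stor = []                            # stands in for the "STOR" array
    exceed = 0
    for batch in range(batches):         # Do 1000 II = 1,50
        rng = random.Random(batch)       # a different seed for each batch
        for _ in range(per_batch):
            x = [beta_from_erlang(rng, k1, k2) for _ in range(size)]
            mean = sum(x) / size
            var = sum((v - mean) ** 2 for v in x) / (size - 1)
            t = (mean - mu) / math.sqrt(var / size)
            stor.append(t)
            if abs(t) > crit:
                exceed += 1
    return stor, exceed / len(stor)


def run_program(batches=50):
    """Four simulations (sample sizes 5, 10, 15, 20) of five cycles each."""
    # Two-sided 5% critical values of Student's t for size - 1 degrees of freedom.
    crit_05 = {5: 2.776, 10: 2.262, 15: 2.145, 20: 2.093}
    results = []
    size = 0
    for simulation in range(4):          # Do 778 IN=1,4
        size += 5                        # MR = MR + 5
        for cycle in range(5):           # Do 777 IB=1,5
            k1, k2 = 4, 4 + cycle        # hypothetical parameter schedule
            stor, alpha_hat = run_cycle(k1, k2, size, crit_05[size], batches)
            results.append((size, k1, k2, len(stor), alpha_hat))
    return results


if __name__ == "__main__":
    # batches=2 keeps the demonstration short; the text describes 50.
    for row in run_program(batches=2):
        print(row)
```

Changing the range of the cycle loop or the increment of `size` corresponds to the edits of Do 777 and MR = MR + 5 described in the text.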

A sample of the results of one simulation is furnished
in Appendix J. It may be recalled that each simulation
consists of five cycles; for each cycle, the program furnishes
the following information as shown in Appendix J:

1. Transformed Beta frequency distribution, denoted by (1);
2. N(0,1) frequency distribution, denoted by (2);
3. Sample size, denoted by (3);
4. Distribution mean, denoted by (4);
5. The actual levels of significance, denoted by (5);
6. The observed levels of significance, denoted by (6);
7. The χ²(1) statistic for each level of significance,
denoted by (7).

At the end of the simulation, a summary of the results is
furnished which contains the actual alpha values, the observed
alpha values for each of the five cycles, the χ² values for one
degree of freedom, and the observed χ² statistics for each
significance level. The computer program for the Gamma
distribution also utilizes the same techniques. This program
is furnished in Appendix J.
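
One common way to obtain a chi-square statistic with one degree of freedom for a single significance level is to treat the 5000 samples as a two-cell table (statistic exceeds the critical value, or does not) and compare the observed counts with those expected under the actual alpha. The sketch below follows that approach; whether the program computes its statistic in exactly this form is an assumption, and the counts in the demonstration are illustrative only, not values from the tables.

```python
CHI2_CRIT_1DF_05 = 3.841  # upper 5% point of chi-square with 1 degree of freedom


def chi_square_1df(rejections, n_samples, alpha):
    """Goodness-of-fit statistic for a reject / do-not-reject split."""
    expected_rej = n_samples * alpha
    expected_acc = n_samples * (1.0 - alpha)
    accepted = n_samples - rejections
    return ((rejections - expected_rej) ** 2 / expected_rej
            + (accepted - expected_acc) ** 2 / expected_acc)


if __name__ == "__main__":
    # Illustrative rejection counts out of 5000 samples at alpha = .05.
    for rejections in (250, 265, 300):
        stat = chi_square_1df(rejections, 5000, 0.05)
        verdict = "significant" if stat > CHI2_CRIT_1DF_05 else "not significant"
        print(rejections, round(stat, 3), verdict)
```

An observed alpha value would then be called significantly different from the actual value when the statistic exceeds the tabled chi-square value for one degree of freedom.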
COMPUTER PROGRAM FOR ERLANG DISTRIBUTION